Introduction Recent advances in artificial intelligence—particularly in Natural Language Processing via Large Language Models (LLMs)—are enabling the development of reliable educational tools in hematology. When trained on validated medical sources and supervised by domain experts, these systems can enhance clinical understanding, support the interpretation of complex cases, and contribute to continuous professional development. Their use is increasingly accepted among specialists as a complement to traditional medical education.

Aim To describe the development, deployment, and validation of generative AI systems based on large language models (LLMs) with retrieval-augmented generation (RAG) for delivering domain-specific educational support, with a focus on malignant and benign hematologic disorders.

Material or patients and method Code Red domain-specific generative AI assistants are developed by physician teams who define a focused medical knowledge domain and curate a closed, validated text corpus. Each assistant operates on OpenAI GPT-4o or GPT-4.1 with retrieval-augmented generation (RAG) and a bespoke system prompt, refined iteratively with subject matter experts. Following public release, integrated voting and comment tools collect real-time user feedback, enabling continuous updates by a medical oversight group. All responses are fully referenced and downloadable as PDF. The platform is natively multimodal, supporting speech input/output, image and PDF parsing, and context-aware reasoning across modalities. These capabilities are designed to enrich advanced medical education, support interpretation of clinical content, and aid in the understanding of complex diagnostic and therapeutic concepts—always under expert supervision and without replacing clinical judgment.

Results Each Code Red assistant is developed by one to two physicians following a structured review of validated scientific literature and is released only after expert clinical validation. By mid-2025, the platform hosts 230 publicly accessible assistants covering diverse medical domains, freely available through www.codigorojo.tech.

Hematology represents one of the most developed areas: 22 assistants are dedicated to malignant hematologic conditions, including 10 focused on specific disease entities such as acute leukemias, chronic lymphocytic leukemia, lymphomas, multiple myeloma, as well as myeloproliferative and myelodysplastic syndromes. Six modules address supportive diagnostic or prognostic domains such as hematological diagnosis, frailty assessment, and immunodeficiency-associated infections. An additional six assistants focus on therapeutic management and associated toxicities, including cell therapy, CAR-T–related complications (CRS, ICANS), tumor lysis syndrome, and cardio-oncology. In benign hematology, 13 assistants are available. Eight of them address disease-specific content such as hemophilia (including gene therapy in hemophilia B), platelet disorders, congenital and acquired coagulopathies, and anemia. The remaining five support transfusion medicine, anticoagulation, massive bleeding management, and interpretation of diagnostic tools such as ROTEM and basic coagulation tests.

Since the platform's launch in November 2024, the assistant interface (https://chatbot.codigorojo.tech) has registered 273,831 visits. Registered users predominantly connected from Spain (58%), followed by the United States, the Netherlands, Mexico, and Ireland. Access is possible via desktop or mobile, with a dedicated mobile app also available. A significant proportion of users interact anonymously, underscoring the platform's open-access, educational nature and international reach.

Conclusions Code Red is among the first multimodal generative AI platforms designed specifically for clinical education. Its successful application in hematology underscores the value of domain-specific, expert-supervised systems in advancing specialized medical training. An independent benchmarking process is underway across specialties to assess performance and reliability. Future developments will focus on orchestrating outputs across domains using reasoning-capable agent models and integrating parallel self-evaluation systems to ensure sustained quality and clinical relevance.

This content is only available as a PDF.
Sign in via your Institution